Web Search Gemo-Lamsade

Abstract

The World Wide Web holds billions of freely accessible documents, and that number keeps growing. One of the major issues it raises is how to search these documents effectively and efficiently to find those that best suit a user's needs. The purpose of this chapter is to describe the techniques at the core of today's search engines (such as Google, Yahoo!, Microsoft Live Search, or Exalead), that is, mostly keyword search in collections of text documents. We also briefly touch upon other techniques and research issues that may be important for next-generation search engines.

This chapter is organized as follows. In Section 1, we discuss the Web and the languages and protocols it relies upon. In Section 2, we present the techniques that can be used to retrieve pages from the Web, that is, to crawl it. First-generation search engines, exemplified by AltaVista, relied mostly on the classical information retrieval (IR) techniques described in Section 3. With the advent of Google, other techniques that exploit the graph structure of the Web (Section 4) have very effectively complemented text IR. We then briefly discuss currently active research topics about the Web in Section 5.

Whereas the Internet is a physical network of computers (or hosts) connected to each other from all around the world, the World Wide Web (WWW, or the Web for short) is a logical collection of hyperlinked documents shared by the hosts of this network. A hyperlinked document is simply a document with references to other documents of the same collection. Note that documents on the Web may be either static documents stored on the hard drive of some Internet host, or dynamic documents generated on the fly when they are accessed. As a result, the number of documents on the Web is virtually unlimited, since dynamic documents can change on each request.
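To make "keyword search in collections of text documents" concrete, here is a minimal sketch of the basic data structure behind it: an inverted index mapping each term to the documents that contain it, queried conjunctively (a document matches only if it contains every query term). This is an illustrative assumption, not the chapter's own implementation: the tokenizer, the function names `build_inverted_index` and `keyword_search`, and the toy documents are all hypothetical, and real engines add ranking (for instance, term weighting) on top of such Boolean matching.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def keyword_search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted ids of documents containing every query term (conjunctive query)."""
    terms = tokenize(query)
    if not terms:
        return []
    # One posting set per query term; a missing term yields an empty set.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    # Hypothetical toy collection standing in for crawled Web pages.
    docs = {
        "d1": "The Web is a collection of hyperlinked documents.",
        "d2": "Search engines crawl the Web and index documents.",
        "d3": "The Internet is a physical network of hosts.",
    }
    index = build_inverted_index(docs)
    print(keyword_search(index, "web documents"))  # ['d1', 'd2']
```

Intersecting posting sets means that adding query terms can only narrow the result, which is the usual behavior of keyword (AND) search.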
When one speaks of the Web, one usually means its public, freely accessible part. There are also various private Webs restricted to some community or company, either on private intranets or on the Internet with password-protected pages. Documents, and more generally resources, on the Web are identified by a URL (Uniform Resource Locator), which is a …
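As an illustration of how a URL identifies a Web resource, the sketch below uses Python's standard `urllib.parse.urlsplit` to break a URL into its components: the protocol (scheme), the Internet host serving the resource, the path on that host, and so on. The helper name `describe_url` and the example URL are hypothetical and serve only to show the decomposition.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict[str, object]:
    """Split a URL into the components that locate a Web resource (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # protocol used to access the resource, e.g. http
        "host": parts.hostname,      # Internet host serving the resource (lower-cased)
        "port": parts.port,          # explicit port number, or None if absent
        "path": parts.path,          # location of the resource on that host
        "query": parts.query,        # parameters, often used to request dynamic documents
        "fragment": parts.fragment,  # reference to a part of the document
    }


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    for key, value in describe_url("http://www.example.com:8080/docs/page.html?lang=en#intro").items():
        print(f"{key}: {value}")
```

The query component is where dynamic documents typically receive their parameters, which is one reason the same host can expose a virtually unlimited number of distinct URLs.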





Publication date: 2008